Text Normalization System for Bangla

نویسنده

  • Firoj Alam
چکیده

This paper describes a process of text normalization system for the Bangla language (exonym: Bengali) by identifying the semiotic classes from Bangla text corpus. After identifying the semiotic classes, a set of rules was written for tokenization and verbalization. This study is important for Text-ToSpeech (TTS) system and as well as for creating a language model used in speech recognition.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Development of annotated Bangla speech corpora

This paper describes the development procedure of three different Bangla read speech corpora which can be used for phonetic research and developing speech applications. Several criteria were maintained in the corpora development process that includes considering the phonetic and prosodic features during text selection. On the other hand, a specification was maintained in the recording phase as ...

متن کامل

Building Statistical Parametric Multi-speaker Synthesis for Bangladeshi Bangla

We present a text-to-speech (TTS) system designed for the dialect of Bengali spoken in Bangladesh. This work is part of an ongoing effort to address the needs of new under-resourced languages. We propose a process for streamlining the bootstrapping of TTS systems for under-resourced languages. First, we use crowdsourcing to collect the data from multiple ordinary speakers, each speaker recordin...

متن کامل

TTS for Low Resource Languages: A Bangla Synthesizer

We present a text-to-speech (TTS) system designed for the dialect of Bengali spoken in Bangladesh. This work is part of an ongoing effort to address the needs of under-resourced languages. We propose a process for streamlining the bootstrapping of TTS systems for under-resourced languages. First, we use crowdsourcing to collect the data from multiple ordinary speakers, each speaker recording sm...

متن کامل

Automatic Extraction of Compound Verbs from Bangla Corpora

In this paper we present a rule-based technique for the automatic extraction of Bangla compound verbs from raw text corpora. In our work we have (a) proposed rules through which a system could automatically identify Bangla CVs from texts. These rules will be established on the basis of syntactic interpretation of sentences, (b) we shall explain problems of CV identification subject to the seman...

متن کامل

Bangla Text to Speech using Festival

This paper describes the development of the first, usable, open source and freely available Bangla Text to Speech (TTS) system for Bangladeshi Bangla using the open source Festival TTS engine. Besides that, this paper also discusses a few practical applications that use this system. This system is developed using diphone concatenation approach in its waveform generation phase. Construction of a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008